43 research outputs found

    DivGraphPointer: A Graph Pointer Network for Extracting Diverse Keyphrases

    Keyphrase extraction from documents is useful to a variety of applications such as information retrieval and document summarization. This paper presents an end-to-end method called DivGraphPointer for extracting a set of diversified keyphrases from a document. DivGraphPointer combines the advantages of traditional graph-based ranking methods and recent neural network-based approaches. Specifically, given a document, a word graph is constructed from the document based on word proximity and is encoded with graph convolutional networks, which effectively capture document-level word salience by modeling long-range dependency between words in the document and aggregating multiple appearances of identical words into one node. Furthermore, we propose a diversified point network to generate a set of diverse keyphrases out of the word graph in the decoding process. Experimental results on five benchmark data sets show that our proposed method significantly outperforms the existing state-of-the-art approaches. Comment: Accepted to SIGIR 201

    Hardware Acceleration for Similarity Measurement in Natural Language Processing

    Abstract: The continuation of Moore's law scaling, but in the absence of Dennard scaling, motivates an emphasis on energy-efficient accelerator-based designs for future applications. In natural language processing, the conventional approach to automatically analyze vast text collections, using scale-out processing, incurs high energy and hardware costs since the central compute-intensive step of similarity measurement often entails pair-wise, all-to-all comparisons. We propose a custom hardware accelerator for similarity measures that leverages data streaming, memory latency hiding, and parallel computation across variable-length threads. We evaluate our design through a combination of architectural simulation and RTL synthesis. When executing the dominant kernel in a semantic indexing application for documents, we demonstrate throughput gains of up to 42× and 58× lower energy per similarity computation compared to an optimized software implementation, while requiring less than 1.3% of the area of a conventional core.

    Using Collective Discourse to Generate Surveys of Scientific Paradigms.

    This thesis is focused on understanding collective discourse and employing its properties to build better decision support systems. We first define collective discourse as a collective human behavior in content generation. In social media, collective discourse is often a collective reaction to an event. A collective reaction to a well-defined subject emerges in response to an event (a movie release, a breaking story, a newly published paper) in the form of independent writings (movie reviews, news headlines, citation sentences) by many individuals. In order to understand collective discourse, we perform our analysis on a wide range of real-world datasets from citations to movie reviews. We show that all these datasets exhibit diversity of perspective, a property seen in other collective systems and a criterion in wise crowds. Our experiments also confirm that the network of different perspective co-occurrences exhibits the small-world property with high clustering of different perspectives. Finally, we show that non-expert contributions in collective discourse can be used to answer simple questions that are otherwise hard to answer. As a concrete example of collective discourse, we discuss citations to scholarly work. We show how they contain important information that conveys the key features and basic underpinnings of a particular field, early and late developments, important contributions, and basic definitions and examples that enable rapid understanding of a field by non-experts. We then present C-LexRank, a system that exploits scientific collective discourse to produce automatically generated, readily consumable technical surveys. Finally, we further extend our experiments to summarize an entire scientific topic. We generate extractive surveys of a set of Question Answering (QA) and Dependency Parsing (DP) papers, their abstracts, and their citation sentences and show that citations have unique survey-worthy information.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/95960/1/vahed_1.pd

    Abstract

    Blogs form a large social network, and their analysis is becoming an important research area today. Blogs are growing rapidly on the Internet, because bloggers can rapidly change their content and linking patterns. Visitors to blogs may comment on the postings of a blog, and this leads to a complex interaction between groups of bloggers. One interesting phenomenon in blogspace is “blogger failure”: a blogger stops writing after a certain amount of time and does not return to blogspace for a long time, or a blogger does not get any comments from her audience. In this paper we present our observations on blogger failure in a unique blogspace. First, we introduce the PersianBlog blogspace and dataset, and then we describe our observations on the commenting behavior of bloggers. Finally, we provide our definition of failure and outline a broad path for future research toward a model of this phenomenon.